TEXT-TO-WAVEFORM CONVERSION

This invention relates to a method and apparatus
for converting text to a waveform. More specifically,
it relates to the production of an output in the form of
an acoustic wave, namely synthetic speech, from an input
in the form of signals representing a conventional text.

This overall conversion is very complicated and
it is sometimes carried out in several modules wherein
the output of one module constitutes the input for the
next. The first module receives signals representing a
conventional text and the final module produces
synthetic speech as its output. This synthetic speech
may be a digital representation of the waveform followed
by conventional digital-to-analogue conversion in order
to produce the audible output. In many cases it is
desired to provide the audible output over a telephone
system. In this case it may be convenient to carry out
the digital-to-analogue conversion after transmission so
that transmission takes place in digital form.

There are advantages in the modular structure,
e.g. each module is separately designed and any one of
the modules can be replaced or altered in order to
provide flexibility, improvements or to cope with
changing circumstances.

Some procedures utilise a sequence of three
modules, namely

(A) pre-editing,

(B) conversion of graphemes to phonemes, and

(C) conversion of phonemes to (digital)
waveform.

A brief description of these modules will now be
given.

Module (A) receives signals representing a
conventional text, e.g. the text of this specification,
and it modifies selected features. Thus module (A) may
specify how numbers are processed. For example, it
will decide if

"1345"

becomes

One three four five
Thirteen forty-five or
One thousand three hundred and forty-five.

It will be apparent that it is relatively easy to
provide different forms of module (A), each of which is
compatible with the subsequent modules so that different
forms of output result.
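
The three readings of "1345" can be sketched as follows. This is
only an illustration: the function names, and the restriction to
numbers below ten thousand, are assumptions made for the example
and are not part of the specification.

```python
# Hypothetical sketch of three number-reading policies for module (A).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_hundred(n: int) -> str:
    """Spell a number in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def digit_by_digit(text: str) -> str:
    """Read every digit separately."""
    return " ".join(ONES[int(d)] for d in text)


def in_pairs(text: str) -> str:
    """Read the digits two at a time (assumes an even number of digits)."""
    return " ".join(below_hundred(int(text[i:i + 2]))
                    for i in range(0, len(text), 2))


def cardinal(n: int) -> str:
    """Read a number in the range 0..9999 as a cardinal, with "and"."""
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    parts = []
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest or not parts:
        if parts:
            parts.append("and")
        parts.append(below_hundred(rest))
    return " ".join(parts)


print(digit_by_digit("1345"))  # one three four five
print(in_pairs("1345"))        # thirteen forty-five
print(cardinal(1345))          # one thousand three hundred and forty-five
```

Each function stands for one possible form of module (A); any of
them could be substituted without changing the later modules.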

Module (B) converts graphemes to phonemes.
"Grapheme" denotes data representations corresponding to
the symbols of the conventional alphabet used in the
conventional manner. The text of this specification is
a good example of "graphemes". It is a problem of
synthetic speech that the graphemes may have little
relationship to the way in which the words are
pronounced, especially in languages such as English.
Therefore, in order to produce waveforms, it is
appropriate to convert the graphemes into a different
alphabet, called "phonemes" in this specification,
which has a very close correlation with the sound of the
words. In other words it is the purpose of module (B)
to deal with the problem that the conventional alphabet
is not phonetic.

Module (C) converts the phonemes into a digital
waveform which, as mentioned above, can be converted
into an analogue format and thence into an audible
waveform.

This invention relates to a method and apparatus
for use in module (B) and this module will now be
described in more detail.

Module (B) utilises linked databases which are
formed of a large number of independent entries. Each
entry includes access data which is in the form of
representations, e.g. bytes, of a sequence of graphemes
and an output string which contains representations, e.g.
bytes, of the phoneme equivalent to the graphemes
contained in the access section. A major problem of
grapheme/phoneme conversion resides in the size of
database necessary to cope with a language. One simple,
and theoretically ideal, solution would be to provide a
database so large that it has an individual entry for
every possible word in the language, including all
possible inflections of every possible word in the
language. Clearly, given a complete database, every
word in the input text would be individually recognised
and an excellent phoneme equivalent would be output. It
should be apparent that it is not possible to provide
such a complete database. In the first place, it is not
possible to list every word in a language and even if
such a list were available it would be too large for
computational purposes.

Although the complete database is not possible,
it is possible to provide a database of useable
dimension which contains, for example, common words and
words whose pronunciation is not simply related to the
spelling. Such a database will give excellent
grapheme/phoneme conversion for the words included
therein but it will fail, i.e. give no output at all,
for the missing words. In any practical implementation
this would mean an unacceptably high proportion of
failure.

Another possibility uses a database in which the
access data corresponds to short strings of graphemes
each of which is linked to its equivalent string of
phonemes. This alternative utilises a
manageable size of database but it depends upon analysis
of the input text to match strings contained therein
with the access data in the database. Systems of this
nature can provide a high proportion of excellent
pronunciations with occurrences of slight and severe
mispronunciation. There will also be a proportion of
failures wherein no output at all is produced either
because the analysis fails or a needed string of
graphemes is missing from the access section of the
database.

WO 94/23423 PCT/GB94/00430

A final possibility is conveniently known as a
"default" procedure because it is only used when
preferred techniques fail. A "default" procedure
conveniently takes the form of "pronouncing" the symbols
of the input text. Since the range of input symbols is
not only known but limited (usually less than 100 and in
many cases less than 50) it is not only possible to
produce the database but its size is very small in
relation to the capacity of modern data storage systems.
This default procedure therefore guarantees an output
even though that output may not be the most appropriate
solution. Examples of this include names in which
initials are used, degrees and honours, and some
abbreviations for units. It will be appreciated that,
in these circumstances, it is usual to "pronounce" the
letters and on these occasions the default
procedure provides the best results.

Three different strategies for converting
graphemes to phonemes have just been identified and it
is important to realise that these alternatives are not
mutually exclusive. In fact it is desirable to use all
three alternatives according to a strict order of
precedence. Thus the "whole word" database is used
first and, if it gives an output, that output will be
excellent. When it fails the "analysis" technique is
used which may involve a small but acceptable number of
mispronunciations. Finally if the "analysis" fails
the default option of pronouncing the "letters" is
utilised and this can be guaranteed to give an output.
Although this may not be completely satisfactory, it
will, in a proportion of cases as explained above, give
the most appropriate result.
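
The order of precedence can be sketched as a simple fallback
chain. The names below (convert_word, WHOLE_WORDS, toy_analysis,
LETTER_SOUNDS) and the bracketed placeholder "phonemes" are
hypothetical and serve only to show the control flow; real
databases would be far larger.

```python
import string
from typing import Callable, Dict, Optional


def convert_word(word: str,
                 whole_words: Dict[str, str],
                 analyse: Callable[[str], Optional[str]],
                 letter_sounds: Dict[str, str]) -> str:
    """Try the three strategies in strict order of precedence."""
    key = word.lower()
    # 1. "Whole word" database: excellent output when the word is listed.
    if key in whole_words:
        return whole_words[key]
    # 2. Analysis into shorter strings: may fail and return None.
    analysed = analyse(key)
    if analysed is not None:
        return analysed
    # 3. Default: "pronounce" each known symbol; unknown symbols are skipped.
    return " ".join(letter_sounds[ch] for ch in key if ch in letter_sounds)


# Placeholder data, assumed for the example only.
WHOLE_WORDS = {"one": "<one>"}
LETTER_SOUNDS = {ch: "<" + ch + ">" for ch in string.ascii_lowercase}


def toy_analysis(word: str) -> Optional[str]:
    """Stand-in for the analysis technique; knows a single word."""
    return "<c><ats>" if word == "cats" else None


print(convert_word("one", WHOLE_WORDS, toy_analysis, LETTER_SOUNDS))
print(convert_word("cats", WHOLE_WORDS, toy_analysis, LETTER_SOUNDS))
print(convert_word("BSc", WHOLE_WORDS, toy_analysis, LETTER_SOUNDS))
```

The last call falls through to the default option, which is the
appropriate treatment for an abbreviation such as a degree.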

This invention relates to the middle option in
the sequence outlined above. That is to say this
invention is concerned with the analysis of the data
representations corresponding to input text graphemes in
order to produce an output set of data representations
being the phonemes corresponding to the input text. It
is emphasised that the working environment of this
invention is the complete text-to-waveform conversion as
described in greater detail above. That is to say this
invention relates to a particular component of the whole
system.

According to this invention an input sequence of
bytes, e.g. data representations representing a string of
characters selected from a first character set such as
graphemes, is dissected into sub-strings for conversion
into an output sequence of bytes, e.g. data
representations representing a string of characters
selected from a second character set such as phonemes,
wherein said method includes retrograde analysis wherein
later occurring bytes are selected before earlier bytes
whereby the selection of the earlier occurring bytes is
at least partially determined by the previous selection
of later occurring bytes.

The method of the invention is particularly
suitable for the processing of an input string divided
into blocks, e.g. blocks corresponding to words, wherein
a block is analysed into segments beginning from the end
and working to the beginning wherein the choice of
segment is taken from the end of the remaining
unprocessed string.

The invention, which is defined in the claims,
includes the methods and apparatus for carrying out the
methods.

The data representations, e.g. bytes, utilised in
the method according to this invention take any signal
form which is suitable for use in computing circuitry.
Thus the data representations may be signals in the form
of electric current (amps), electric potential (volts),
magnetic fields, electric fields, or electro-magnetic
radiation. In addition, the data representations may
be stored, including transient storage as part of
processing, in a suitable storage medium, e.g. as the
degree of and/or the orientation of magnetisation in a
magnetic medium.
The theoretical basis and some preferred
embodiments of the invention will now be described. In
the preferred embodiments the input signals are divided
into blocks which correspond to the individual words of
the text and the invention works on each block
separately; thus the process can be considered as
"word-by-word" processing.

It is now convenient to restate the requirement
that it is not necessary to produce an output for every
one of the blocks because, as described above, the whole
system includes further modules to deal with such
failures.

As a preliminary, it is convenient to illustrate
the theoretical basis of the invention by considering
the structure of words in the English language and by
commenting on the structures of a few specific words.

This analysis uses the distinction usually identified as
"vowels" and "consonants". For mechanical processing it
is necessary to store two lists of characters. One of
these lists contains the characters specified as
"vowels" and the other list contains those characters
designated as "consonants". All characters are,
preferably, included in one or other of the lists but,
in the preferred embodiment, the data representations
corresponding to "Y" are included in both lists. This
is because conventional English spelling sometimes
utilises the letter "Y" as a vowel and sometimes as a
consonant. Thus the first list (of vowels) contains a,
e, i, o, u and y, whereas the second list of consonants
contains b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t,
v, w, x, y, z. The fact that "Y" appears in both lists
means that the condition "not vowel" is different from
the condition "consonant".
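
The two lists, and the difference between "not vowel" and
"consonant", can be checked with a short sketch. The set names
are illustrative assumptions; the letters themselves are those
listed above.

```python
import string

# The two stored lists; "y" is deliberately present in both.
VOWELS = frozenset("aeiouy")
CONSONANTS = frozenset("bcdfghjklmnpqrstvwxyz")

ALPHABET = frozenset(string.ascii_lowercase)
NOT_VOWELS = ALPHABET - VOWELS

# Every letter is in at least one list, and only "y" is in both.
print(VOWELS | CONSONANTS == ALPHABET)   # True
print(sorted(VOWELS & CONSONANTS))       # ['y']

# "Not vowel" and "consonant" differ exactly by the letter "y".
print(sorted(CONSONANTS - NOT_VOWELS))   # ['y']
```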
The primary purpose of the analysis is to split
a block of data representations, i.e. a word, into
"rimes" and "onsets". It is important to realise that
the analysis uses linked databases which contain the
grapheme equivalents of rimes and onsets linked to their
phoneme equivalents. The purpose of the analysis is not
merely to split the data into arbitrary sequences
representing rimes and onsets but into sequences which
are contained in the database.

A rime denotes a string of one or more
characters each of which is contained in the list of
vowels or such a string followed by a second string of
characters not contained in the list of vowels. An
alternative statement of this requirement is that a rime
consists of a first string followed by a second string
wherein all the characters contained in the first string
are contained in the list of vowels and the first string
must not be empty and the second string consists
entirely of characters not found in the list of vowels
with the proviso that the second string may be empty.

An onset is a string of characters all of which
are contained in the list of consonants.
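
These two definitions translate directly into predicates. The
sketch below is illustrative: the function names are assumptions,
and treating an empty string as "not an onset" is a choice made
for the example. Note that the predicates test only the shape of
a string; whether it is actually listed in the database is a
separate question.

```python
VOWELS = frozenset("aeiouy")
CONSONANTS = frozenset("bcdfghjklmnpqrstvwxyz")


def is_rime(s: str) -> bool:
    """Non-empty run of vowels, then a possibly empty run of non-vowels."""
    i = 0
    while i < len(s) and s[i] in VOWELS:
        i += 1
    return i > 0 and all(ch not in VOWELS for ch in s[i:])


def is_onset(s: str) -> bool:
    """Non-empty string made up entirely of listed consonants."""
    return len(s) > 0 and all(ch in CONSONANTS for ch in s)


print(is_rime("ats"), is_rime("eet"), is_rime("igh"))  # True True True
print(is_rime("str"), is_rime("ate"))                  # False False
print(is_onset("str"), is_onset("igh"))                # True False
print(is_rime("y"), is_onset("y"))                     # True True
```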

The analysis requires that the end of a word
shall be a rime. It is permitted that the word contains
adjacent rimes, but it is not permitted that it contains
adjacent onsets. It has been specified that the end of
the word must be a rime but it should be noted that the
beginning of the word can be either a rime or an
onset; for instance "orange" begins with a rime
whereas "pear" begins with an onset.

In order to illustrate the underlying theory of
the invention four specimen words, arbitrarily selected
from the English language, will be displayed and
analysed into their rimes and onsets.

CATS
rime "ats"
onset "c"

It is to be expected that "ats" will be listed
as a rime and "c" will be listed as an onset. Therefore
replacing each by its phoneme equivalent will convert
"cats" into phonemes.

It should be noted that the rime "ats" has a
first string consisting of the single vowel "a" and a
second string which consists of two non-vowels namely
"t" and "s".

STREET
rime "eet"
onset "str"

In this case the first string of the rime contains two
letters namely "ee" and the second string is a single
non-vowel "t". The onset consists of a string of three
consonants.

The onset "str" and the rime "eet" should both
be contained in the database so that phoneme equivalents
are provided.

HIGH
rime "igh"
onset "h"

In this example the rime "igh" is one of the
arbitrary sounds of the English language but the
database can give a correct conversion to phonemes.

FOURTH SPECIMEN

HIGHSTREET
second rime "eet"
second onset "str"
first rime "igh"
first onset "h"

Clearly the word "highstreet" is a compound of
the previous two examples and its analysis is very
similar to these two examples. However, there is an
important extra requirement in that it is necessary to
recognise that there is a break between the fourth and
fifth letters in order to split the word into "high" and
"street". This split is recognised by virtue of the
contents of the database. Thus the consonant string
"ghstr" is not an onset in the English language and,
therefore, it will not be in the database so that it
cannot be recognised. Furthermore the string "hstr"
will not be in the database. However, "str" is a common
onset in English and it should be in the database.
Therefore "str" can be recognised as an onset and "str"
is the later part of the string "ghstr". Once the end
of the string has been recognised as an onset the
earlier part is identified as part of the preceding rime
and the word "high" can be split as described above. It
is the purpose of this example to illustrate that the
splitting of an internal string of consonants is
sometimes important and that the split is achieved by
the use of the database.
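
The splitting of the internal string "ghstr" can be sketched as
a search for the longest ending that is listed as an onset. The
function name split_cluster and the tiny onset set are
assumptions made for illustration.

```python
from typing import Set, Tuple


def split_cluster(cluster: str, onsets: Set[str]) -> Tuple[str, str]:
    """Split a run of consonants into (tail of preceding rime, onset).

    Endings are tried from longest to shortest, so the first ending
    found in the onset list is the longest one.
    """
    for start in range(len(cluster)):
        if cluster[start:] in onsets:
            return cluster[:start], cluster[start:]
    # No listed onset: the whole run is left for the preceding rime.
    return cluster, ""


ONSETS = {"c", "str", "h"}
print(split_cluster("ghstr", ONSETS))  # ('gh', 'str')
```

Neither "ghstr" nor "hstr" is listed, so the first match is "str"
and the residue "gh" is left to join the preceding rime "igh".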

We have now given a description of the theory
which underlies the techniques of the invention and it
is now appropriate to indicate how this is carried into
effect using automatic computing equipment, which is
illustrated in the accompanying diagrammatic drawing.

The computing equipment operates on strings of
signals, e.g. electrical pulses. The smallest unit of
computation is a string of signals corresponding to a
single grapheme of the original text. For convenience
such a string of signals will be designated as a "byte"
no matter how many bits it contains.
Originally the term "byte" indicated a sequence of 8
bits. Since 8 bits provide a count of 255 (256 distinct
values) this is sufficient to accommodate most
alphabets. However, the "byte" does not necessarily
contain 8 bits.

The processing described below is carried out
block-by-block wherein each block is a string of one or
more bytes. Each block corresponds to an individual
word (or potential word, since it is possible that the
data will contain blocks which are not translatable so
that the conversion must fail). The purpose of the
method is to convert an input block whose bytes
represent graphemes into an output block whose bytes
represent phonemes. The method works by dividing the
input block into sub-strings, converting each sub-string
in a look-up table and then concatenating to produce the
output block.

The operational mode of the computing equipment
has two operating procedures. Thus it has a first
procedure which includes two phases and the first
procedure is utilised for identifying byte strings
corresponding to rimes. The second procedure has only
one phase and it is used for identifying byte strings
corresponding to onsets.

As indicated in the drawing, the computing
equipment comprises an input buffer 10 which holds
blocks from previous processing until they are ready to
be processed. The input buffer 10 is connected to a
data store 11 and it provides individual blocks to the
data store 11 on demand.

An important part of the computing equipment is
storage means 12. This contains programming
instructions and also the databases and lists which are
needed to carry out the processing. As will be
described in greater detail below, storage means 12 is
divided into various functional areas.

The data processing equipment also includes a
working store 14 which is required to hold sub-sets of
bytes acquired from data store 11, for processing and
for comparison with byte strings held in databases
contained in the storage 12. Single bytes, i.e. signal
strings corresponding to individual graphemes, are
transferred from the input buffer 10 to the working
store 14 via check store 13 which has capacity for one
byte. The byte in check store 13 is checked against
lists contained in data storage 12 before transfer to
the working store 14.

After successful matching with items contained
in the storage means 12, strings are transferred from
the working store 14 to the output store 15. For use
when matching fails the equipment includes means to
return a byte from the working store 14 to the data
store 11.

In addition to other areas, e.g. for program
instructions, the storage means 12 has four major
storage areas. These areas will now be identified.

First the storage means has areas for two
different lists of bytes. These are a first storage
area 12.1 which contains a list of bytes
corresponding to the vowels and a second storage area
12.2 which contains a list of bytes corresponding to the
consonants. (The vowels and the consonants have been
previously identified in this specification).

The storage means 12 also contains two areas of
storage which constitute two different, and substantial,
linked databases. First there is the rime database 12.3
which is further divided into regions designated 12.31,
12.32, 12.33, etc. Each region has an input section
containing byte strings corresponding to "rimes" in
graphemes and, as shown in the drawing, this includes
12.31 containing "ATS", 12.32 containing "EET", 12.33
containing "IGH" and many more sections not illustrated
in the drawing.

The storage means 12 also contains a second
major area 12.4, which contains byte strings equivalent
to the onsets. As with the rimes, the onset database
12.4 is also divided into many regions. For example, it
comprises 12.41 containing "C", 12.42 containing "STR"
and 12.43 containing "H".

Each of the input sections (of 12.3 and 12.4) is
linked to an output section which contains a string of
bytes corresponding to the content of its input section.

It has already been stated that the operational
method includes two different procedures. The first
procedure utilises storage areas 12.1 and 12.3 whereas
the second procedure utilises storage areas 12.2 and
12.4. It is emphasised that the areas of the database
which are actually used are defined entirely by the
procedure in operation. The procedures are used
alternately and procedure number 1 is used first.

SPECIFIC EXAMPLE

It will be noted that this specific example
relates to the word selected as the fourth specimen in
the description given above. Therefore its rimes and
onsets are already defined and the specific example
explains how these are achieved by mechanical
computation.

The analysis begins when the input buffer 10
transfers the byte string corresponding to the word
"HIGHSTREET" into the data store 11. Thus, at the start
of the process, the important stores have the contents
as follows:-

STORE  CONTENT
11     HIGHSTREET
13     -
14     -
15     -

(The symbol "-" indicates that the relevant store is
empty).

The analysis begins, as always, with the first
procedure. As mentioned above, the first procedure uses
storage regions 12.1 and 12.3. The first procedure has
two phases during which bytes are transferred from the
data store 11 to the working store 14 via the check
store 13. The first phase continues for so long as the
bytes are not found in storage region 12.1.

The procedure is retrograde, which means that
it works from the back of the word and therefore the
first transfer is "T" which is not contained in region
12.1. The second transfer is "E" which is contained in
the region 12.1 and therefore the second phase of the
first procedure is initiated. This continues for as
long as the byte in working store 14 is matched in 12.1;
therefore the second "E" is transferred but the check
fails when the next byte "R" is passed. At this stage
the state of the various stores is as follows.

11     HIGHST
13     R
14     EET
15     -

The contents of the working store 14 are used to
access storage area 12.3 and a match is found in region
12.32. Thus the match has succeeded and the content of
the working store 14, namely "EET", is transferred to a
region of the output store 15 so that the state of the
various stores is as follows.

STORE  CONTENT
11     HIGHST
13     R
14     -
15     EET

It will be noticed that the first rime has been found
mechanically.

As mentioned above, the non-matching of "R" in
the check store 13 terminated the first performance of
the first procedure. The analysis continues but the
second procedure is now used because the two procedures
always alternate. The second procedure utilises the
storage regions 12.2 and 12.4. The byte corresponding
to "R" in check store 13 now matches because region 12.2
is now in use and this byte is contained therein.
Therefore "R" is transferred to the working store 14 and
the second procedure continues so long as the byte in
check store 13 matches. Thus the letters "T", "S", "H"
and "G" are all transferred via the check store 13. At
this point the byte corresponding to "I" arrives in the
check store 13 and the check fails because the byte
corresponding to "I" is not contained in storage region
12.2. Since the check fails this performance of the
second procedure terminates. The contents of the
various stores are:-

STORE  CONTENT
11     "H"
13     "I"
14     "GHSTR"
15     "EET"

The second procedure will attempt to match the
content of the working store 14 with the database
contained in 12.4 but no match will be achieved.
Therefore the second procedure continues with its
remedial part wherein the bytes are transferred back to
the data store 11 via the check store 13. At each
transfer it is attempted to locate the content of the
working store 14 in storage area 12.4. A match will be
achieved when the letters G and H have been returned
because the string equivalent to "STR" is contained in
region 12.42. Having achieved a match the content of
the working store is put out into a region of the output
store 15. At this point the content of the various
stores is as follows:-

STORE  CONTENT
11     "HIG"
13     "H"
14     -
15     "STR" and "EET"

The second procedure was terminated by finding the match
so the analysis now goes back to the first procedure and
more particularly to the first phase of the first
procedure. In this way the letters "H" and "G" are
transferred to the working store 14 and the first phase
ends. The second phase passes "I" and it terminates
when "H" is transferred to the check store 13. At this
stage the various stores have contents as follows:-

11     -
13     "H"
14     "IGH"
15     "STR" and "EET"

The first procedure now attempts to match the content of
the working store 14 with the database in the storage
area 12.3 and a match is found in region 12.33.
Therefore the content of the working store 14 is
transferred to a region of the output store 15.

The analysis now continues with the second
procedure and the letter "H" (in the check store 13) is
located in storage region 12.2 (note that this region is
now in use because the analysis has now gone back to the
second procedure). The analysis can now terminate
because the data store 11 has no further bytes to
transfer and the content of the working store, namely
"H", is found in region 12.43 of the storage means 12.
Thus "H" is transferred to the output store 15, which
contains the correct four strings found by mechanical
analysis.
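
The walk-through above can be condensed into a short sketch of
the retrograde analysis. It is a simplification under stated
assumptions: the stores 11, 13 and 14 are modelled by slicing a
single string rather than by moving bytes one at a time, the
function name segment and the small RIMES and ONSETS sets are
hypothetical, and the returned list plays the part of the output
store 15 (last segment first).

```python
from typing import List, Optional, Set, Tuple

VOWELS = frozenset("aeiouy")
CONSONANTS = frozenset("bcdfghjklmnpqrstvwxyz")
Segment = Tuple[str, str]  # (kind, graphemes); kind is "rime" or "onset"


def segment(word: str, rimes: Set[str],
            onsets: Set[str]) -> Optional[List[Segment]]:
    """Split a word into listed rimes and onsets, working from the end.

    Returns the segments in the order found, or None if analysis fails.
    """
    remaining = word.lower()
    found: List[Segment] = []
    while remaining:
        # First procedure, phase 1: pass over trailing non-vowels.
        i = len(remaining)
        while i > 0 and remaining[i - 1] not in VOWELS:
            i -= 1
        if i == 0:
            return None  # no vowel left to form a rime
        # First procedure, phase 2: pass over the vowels before them.
        while i > 0 and remaining[i - 1] in VOWELS:
            i -= 1
        # Return leading bytes one at a time until a listed rime remains.
        while i < len(remaining) and remaining[i:] not in rimes:
            i += 1
        if i == len(remaining):
            return None  # no listed rime could be matched
        found.append(("rime", remaining[i:]))
        remaining = remaining[:i]
        if not remaining:
            break
        # Second procedure: collect the consonants before the rime.
        j = len(remaining)
        while j > 0 and remaining[j - 1] in CONSONANTS:
            j -= 1
        if j == 0:
            # Beginning of the word reached: the whole run must match.
            if remaining not in onsets:
                return None
            found.append(("onset", remaining))
            break
        # Mid-word: return leading bytes until a listed onset remains.
        while j < len(remaining) and remaining[j:] not in onsets:
            j += 1
        if j < len(remaining):
            found.append(("onset", remaining[j:]))
            remaining = remaining[:j]
        # Otherwise no onset was found and the next rime takes the bytes.
    return found


RIMES = {"ats", "eet", "igh"}
ONSETS = {"c", "str", "h"}
print(segment("HIGHSTREET", RIMES, ONSETS))
# [('rime', 'eet'), ('onset', 'str'), ('rime', 'igh'), ('onset', 'h')]
```

The result reproduces the four strings of the example in the
order in which they reach the output store. A return value of
None corresponds to the case where the analysis fails and the
word is left to the default technique.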

The necessary output strings having been
located, it is only necessary to convert them using the
fact that storage areas 12.3 and 12.4 are linked
databases. Each region not only has the strings now
contained in the output store, but each region has
linked output regions containing strings corresponding
to the appropriate phonemes. Therefore each string in
the output store is used to access its appropriate
region and hence produce the necessary output. The
final step merely utilises a look-up table and this is
possible because the important analysis has been
completed.
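
The final look-up and concatenation can be sketched with two
dictionaries standing in for the linked databases 12.3 and 12.4.
The phoneme strings shown are invented placeholders, not the
notation of any real system, and the names RIME_DB, ONSET_DB and
to_phonemes are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Input section -> output section (placeholder phoneme strings).
RIME_DB: Dict[str, str] = {"ats": "/ats/", "eet": "/i:t/", "igh": "/ai/"}
ONSET_DB: Dict[str, str] = {"c": "/k/", "str": "/str/", "h": "/h/"}


def to_phonemes(output_store: List[Tuple[str, str]]) -> str:
    """Convert the located strings and join them in reading order.

    The output store holds the strings in the order they were found,
    i.e. last segment first, so it is reversed before conversion.
    """
    converted = []
    for kind, graphemes in reversed(output_store):
        table = RIME_DB if kind == "rime" else ONSET_DB
        converted.append(table[graphemes])
    return " ".join(converted)


store = [("rime", "eet"), ("onset", "str"), ("rime", "igh"), ("onset", "h")]
print(to_phonemes(store))  # /h/ /ai/ /str/ /i:t/
```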

As indicated above, the identified strings serve
as access to the linked database and, in a simple
system, there is one output string for each access
string. However, pronunciation sometimes depends on
context and improved conversion can be achieved by
providing a plurality of outputs for at least some of
the access strings. Selecting the appropriate output
string depends upon analysing the context of the access
string, e.g. to take into account the position in the
word or what follows or what precedes. This further
complication does not affect the invention, which is
solely concerned with the division into appropriate
sections. It merely complicates the look-up process.

As was explained above, the invention is not
necessarily required to produce an output because, in
the case of failure, the complete system contains a
default technique, e.g. providing a phoneme equivalent
for each grapheme. In order to complete the description
of the technique, it is considered desirable to provide
a brief indication of the circumstances in which this
failure occurs and use of a default technique is
required.

Failure Mode 1

The first failure mode will occur when the
content of the data store does not contain a vowel, which
implies that it is not a word. As always, the analysis
starts by using the first procedure and, more
specifically, the first phase of the first procedure and
this will continue so long as there is no match with the
first list 12.1. Since the string in data store 11
contains no match, the first phase will continue until
the beginning of the word and this indicates that there
is a failure.

Failure Mode 2

This failure occurs when:-
(i) the second procedure is in use;
(ii) the beginning of the word is reached; and
(iii) there is no match for the content of the working
store 14 in the database 12.4.

This contrasts with failure to match during the
middle of the word which implies that a vowel is
contained in the check store 13. Failure at this stage
permits the returning of bytes for later analysis by the
first procedure and there is no failure, at least not at
this point in the analysis. When the beginning of the
word is reached, there is no possibility of further
analysis and hence the analysis has to fail.

Failure Mode 3

The third failure mode occurs when the first
procedure is in use and it is not possible to match the
contents of the working store 14 with a string contained
in the database 12.3. Under these circumstances the
first procedure will transfer bytes back to the check
store 13 and the data store 11 and this transfer can
continue until working store 14 becomes empty and the
analysis also fails.

In the second failure mode, it was explained
that the second procedure is allowed to return bytes to
input for later analysis by the first procedure.
However, the transferred bytes must be matched at some
time and this means during the next performance of the
first procedure. The third failure mode corresponds to
the case where it is not possible to achieve the later
match.

Thus the method of the invention provides
analysis of a data string into segments which can be
converted using look-up tables. It is not necessary
that the analysis shall succeed in every case but, given
good databases, the method will work very frequently and
enhance the performance of a complete system which
comprises the other modules necessary for text-to-speech
conversion.

CLAIMS

1. A method of processing an input signal
representing a string of characters selected from a
first character set so as to identify sub-strings for
conversion into an output signal representing a string
of characters selected from a second character set,
wherein said method divides said input signal into
sub-strings by retrograde analysis, said retrograde analysis
comprising the selection of later occurring portions of
the input signal before earlier occurring portions
thereof wherein the prior selection of a later portion
at least partially defines the selection of an earlier
occurring portion; said later occurring portions being
contained in one of said sub-strings and said earlier
occurring portion being contained in a different one of
said sub-strings.

2. A method according to claim 1, wherein said
input signal is composed of a string of bytes each of
said bytes corresponding to a character of the first
character set.

3. A method according to either claim 1 or claim 2,
wherein the method is performed in conjunction with
signal storage means which includes first, second, third
and fourth storage areas wherein:-

(i) the first storage area contains a
plurality of bytes each of which
represents a character selected from the
first character set;

(ii) the second storage area contains a
plurality of bytes each of which
represents a character selected from the
first character set, the total content of
said second storage area being different
from the total content of said first
storage area;

(iii) the third storage area contains strings
consisting of one or more bytes
representing characters of the first
character set wherein the or the first
byte of each string is contained in the
first storage area; and

(iv) the fourth storage area contains strings
of one or more bytes the or each of which
is contained in the second storage area.

4. A method according to claim 3, wherein the input
signal is divided into blocks and processing of at least
some of said blocks comprises:-

(a) identifying an internal string of
consecutive bytes each of which is
contained in the second storage area, said
string being immediately preceded by a
predecessor byte contained in the first
storage area and immediately followed by
a successor byte contained in the first
storage area;
(b) identifying the longest end string of said
internal string with strings contained in
the fourth storage area;
(c) defining an initial portion of said
internal string being the residue of said
internal string after the separation of
the end string defined in (b) and
combining said initial string with the
predecessor byte specified in (a) and
identifying a string including said
predecessor byte and said initial portion
with a string stored in said second
storage area.

5. A method of converting an input signal
representing a string of characters selected from the
first character set into an equivalent signal
representing a string of characters selected from the
second character set; which method comprises identifying
sub-strings by a method according to any one of the
preceding claims and converting sub-strings by a linked
database which has input sections each of which contains
one of said sub-strings, each input section being linked
to an output section which contains the output
equivalent of the content of the input section.



6. A method according to claim 5, wherein the input
signal is divided into input blocks and wherein each
block is separately converted wherein at least some of
said blocks are converted as a whole without
sub-division and at least some of the said blocks are
converted by a method according to claim 5.